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Abstract. A linear inverse problem is proposed that requires the determination of multiple 
unknown signal vectors. Each unknown vector passes through a different system matrix and the 
results are added to yield a single observation vector. Given the matrices and lone observation, 
the objective is to find a simultaneously sparse set of unknown vectors that solves the system. We 
will refer to this as the multiple- system single- output (MSSO) simultaneous sparsity problem. This 
manuscript contrasts the MSSO problem with other simultaneous sparsity problems and conducts a 
thorough initial exploration of algorithms with which to solve it. 

Seven algorithms are formulated that approximately solve this NP-Hard problem. Three greedy 
techniques are developed (matching pursuit, orthogonal matching pursuit, and least squares matching 
pursuit) along with four methods based on a convex relaxation (iteratively reweighted least squares, 
two forms of iterative shrinkage, and formulation as a second-order cone program). While deriving the 
algorithms, we prove that seeking a single sparse complex-valued vector is equivalent to seeking two 
simultaneously sparse real-valued vectors. In other words, single- vector sparse approximation of a 
complex vector readily maps to the MSSO problem, increasing the applicability of MSSO algorithms. 

The algorithms are evaluated across three experiments: the first and second involve sparsity 
profile recovery in noiseless and noisy scenarios, respectively, while the third deals with magnetic 
resonance imaging radio-frequency excitation pulse design. For sparsity profile recovery, the itera- 
tively reweighted least squares and second-order cone programming techniques outperform the greedy 
algorithms, whereas in the magnetic resonance imaging pulse design experiment, the greedy least 
squares matching pursuit algorithm exhibits superior performance. Overall, each algorithm is found 
to have particular merits and weaknesses, e.g., the iterative shrinkage techniques converge slowly, but 
because they update only a subset of the overall solution per iteration rather than all unknowns at 
once, they are useful in cases where attempting the latter is prohibitive in terms of system memory. 

Key words, sparse approximation, simultaneous sparse approximation, multiple unknown vec- 
tors, greedy pursuit, iteratively reweighted least squares, iterative shrinkage, second-order cone pro- 
gramming 

1. Introduction. In this work we propose a linear inverse problem that requires 
a simultaneously sparse set of vectors as the solution, i.e., a set of vectors where only 
a small number of each vector's entries are nonzero, and where the vectors' sparsity 
profiles (the locations of the nonzero entries) are equivalent (or are promoted to be 
equivalent with an increasing penalty given otherwise). 

Prior work on simultaneously sparse solutions to linear inverse problems involves 
multiple unknown vectors, a single system matrix, and a host of observation vectors; 
the pth observation vector arises by multiplying the single system matrix with the pth 
unknown vector [7,33,47,49]. Given the observation vectors and system matrix, one 
seeks out a simultaneously sparse set of unknown vectors that (approximately) solves 
the overall system. We will refer to this as the single-system multiple- output (SSMO) 
simultaneous sparsity problem. 

Here we study a problem that is somewhat similar yet different from the aforemen- 
tioned one. This multiple-system single-output (MSSO) simultaneous sparsity problem 
still consists of multiple unknown vectors, but now each such vector is passed through 
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a different system matrix and the outputs of the various system matrices undergo a 
linear combination, yielding only one observation vector. Given the matrices and lone 
observation, one must find a simultaneously sparse set of vectors that (approximately) 
solves the system. To date, this problem has been explored in a magnetic resonance 
imaging (MM) radio-frequency (RF) excitation pulse design context [51-53], but it 
may also have applications to source localization using sensor arrays [26,30] and signal 
denoising [4,11,16,20]. 

Both SSMO and MSSO arise as generalizations of the single-system single-output 
(SSSO) sparse approximation problem, where there is one known observation vector, 
a known system matrix, and the solution one seeks is a single sparse unknown vector 
[4,23,40]. Several styles of algorithms have been posed to solve the SSSO problem, 
such as forward-looking greedy pursuit [3,6,10,34,35], iteratively reweighted least 
squares (IRLS) [28] (e.g., FOCUSS [23]), iterative shrinkage [8,14-16], and second- 
order cone programming (SOCP) [2,33]. Many of these have been extended to the 
SSMO problem [7, 33 , 47, 49] . 

In this manuscript, we propose three forward-looking greedy techniques — matching 
pursuit (MP) [34], orthogonal matching pursuit (OMP) [3,6,35], and least squares 
matching pursuit (LSMP) [6] — and also develop IRLS-based, shrinkage-based, and 
SOCP-based algorithms to solve the MSSO simultaneous sparsity problem. We then 
evaluate the performance of the algorithms across three experiments: the first and 
second involve sparsity profile recovery in noiseless and noisy scenarios, respectively, 
while the third deals with MRI RF excitation pulse design. 

The structure of this paper is as follows: in Sec. 2, we provide background informa- 
tion about ordinary (SSSO) sparse approximation and SSMO sparse approximation. 
In Sec. 3, we formulate the MSSO problem. Seven algorithms for solving the prob- 
lem are then posed in Sec. 4, while the details and results of the three numerical 
experiments appear in Sec. 5. Section 6 highlights the strengths and weaknesses of 
the algorithms and presents ideas for future work. Concluding remarks are given in 
Sec. 7. 

2. Background. 

2.1. Single-System Single-Output (SSSO) Sparse Approximation. Con- 
sider a linear system of equations d = Fg, where d G C M , F G C MxJV , g G C^, and 
d and F are known. It is common to use the Moore-Penrose pseudoinverse of F, 
denoted F^, to determine g = FM as an (approximate) solution to the system of 
equations. When d is in the range of F, g is the solution that minimizes ||g||2) the 
Euclidean or £ 2 norm of g. When d is not in the range of F, no solution exists; g 
minimizes ||g||2 among the vectors that minimize ||d — Fg||2- 

When a sparse solution is desired, it is necessary for the analogue to g to have 
only a small fraction of its entries differ from zero. We are faced with a sparse 
approximation problem of the form 

min ||g|| subject to ||d - Fg|| 2 < e, (2.1) 
g 

where || • ||o denotes the number of nonzero elements of a vector. The subset of 
{1, 2, . . . , N} where there are nonzero entries in g is called the sparsity profile. For 
general F, solving (2.1) essentially requires a search over up to 2 N — 1 nonempty 
sparsity profiles. The problem is thus computationally infcasiblc except for very small 
systems of equations (e.g., even for N = 30, one may need to search 1,073,741,823 
subsets). Formally, the problem is NP-Hard [9,35]. 
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For problems where (2.1) is intractable, a large body of work supports a greedy 
search over the columns of F to seek out a small subset of columns that, when weighted 
and linearly combined, yields a result that is close to d in the li sense, along with a 
sparse g [3,6,10,34,35]. 

A second body of research supports the relaxation of (2.1) to find a sparse g [4]: 

minHgll! s.t. ||d-Fg|| 2 < e. (2.2) 
g 

This is a convex optimization and thus may be solved efficiently [2]. The solution 
of (2.2) does not always match the solution of (2.1) — if it did, the intractability of 
(2.1) would be contradicted — but certain conditions on F guarantee a proximity of 
their solutions [12,13,48]. Note that (2.2) applies an l\ norm to g, but an £ p norm 
(where p < 1) may also be used to promote sparsity [4,23]; this leads to a non-convex 
problem and will not considered in this paper. 
The optimization 



mm 
g 



{Hld-FglH + AHgllx} (2.3) 



has the same set of solutions as (2.2). The first term of (2.3) keeps residual error 
down, whereas the second promotes sparsity of g [4,43]. As the control parameter, 
A, is increased from zero to infinity, the algorithm yields sparser solutions and the 
residual error increases; sparsity is traded off with residual error. In this paper we 
shall hereafter use formulation (2.3) and its analogues rather than (2.2). 

It is important to understand that a problem of the form (2.3) may arise as a 
proxy for (2.1) or as the inherent problem of interest. For example, in a Bayesian 
estimation setting, (2.3) yields the maximum a posteriori probability estimate of g 
from d when the observation model involves F and Gaussian noise and the prior on g is 
Laplacian. Similar statements can be made about the relaxations of the simultaneous 
sparse approximation problems posed in the following sections. 

2.2. Single-System Multiple-Output (SSMO) Simultaneous Sparse Ap- 
proximation. In SSMO, we have P observation vectors ("snapshots"), all of which 
arise from the same system matrix: 

d p =Fg p , forp=l,...,P, (2.4) 

where d p £ C M is known for p = 1, . . . , P along with F £ C MxN . In this scenario, we 
want to constrain the number of positions at which any of the g p s are nonzero. Thus 
we seek approximate solutions in which the g p s are not only sparse, but the union 
of their sparsity patterns is small; that is, a simultaneously sparse set of vectors is 
desired [33,47]. Unfortunately, optimal approximation with a simultaneous sparsity 
constraint is even harder than (2.1). 

Extending single-vector sparse approximation greedy techniques is one way to 
find an approximate solution [7,49]. Another approach is to extend the relaxation 
(2.3) as follows: 



mm 

G 



{±||D-FG||2 +A||G|| S }, (2.5) 

where D = [d 1; . . . , d P ] £ C MxP , G = [ gl , . . . ,g P ] £ C NxP , || • || F is the Frobenius 
norm, and 



N 



|G||s = E 



JV 



£|G(n,p)|2 = £ 



1 \ p=l n=l \ 



£|g>]| 2 , ( 2 - 6 ) 

p=l 
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i.e., || G | js is the l\ norm of the £2 norms of the rows of the G matrix. 1 This latter 
operator is a simultaneous sparsity norm: it penalizes the program (produces large 
values) when the columns of G have dissimilar sparsity profiles [33]. Fixing A to a 
sufficiently large value and solving this optimization yields simultaneously sparse g p s. 
For P = 1, (2.5) collapses to the base case of (2.3). Given the convex objective function 
in (2.5), one may then attempt to find a solution that minimizes the objective using an 
IRLS-based [7], or SOCP-based [33] approach. A formal analysis of the minimization 
of the convex objective may be found in [47]. 

3. Multiple-System Single-Output (MSSO) Simultaneous Sparse Ap- 
proximation. We outline the MSSO problem in a style analogous to that of SSMO 
systems in (2.4, 2.5) and then pose a second formulation that is useful for deriving 
several algorithms in Sec. 4. 

3.1. MSSO Problem Formulation. Consider the following system: 
d = Fi gl + --- + F P gp 

(3.1) 



gp 



Ftotgto 



where d G C M and the F p G C MxN arc known. Unlike the SSMO problem, there is 
now only one observation and P different system matrices. Here we again desire an 
approximate solution where the g p s are simultaneously sparse; formally, 

min ||d — Ftotgtotlb s.t. the simultaneous if -sparsity of the g„s. (3.2) 
Si> - - - >gp 

This is, of course, harder than the SSSO problem and thus NP-Hard. To keep the 
problem as general as possible, there are no constraints on the values of M, A, or P, 
i.e., there is no explicit requirement that the system be overdetermined or underde- 
termined. Further, we have used complex-valued rather than real-valued variables. 

In the first half of Sec. 4, we will pose three approaches that attempt to solve the 
MSSO problem (3.2) in a greedy fashion. Another approach to solve the problem is 
to apply a relaxation similar to (2.3, 2.5): 



mm 

G 



{i||d-F tot g tot ||^ + A||G|| s }, (3.3) 



where G and ||G||s are the same as in (2.5) and (2.6), respectively. In the second half 
of Sec. 4, we will outline four algorithms for solving this relaxed problem. By setting 
P = 1, (3.3) collapses to the base case of (2.3). 

3.2. Alternate Formulation of the MSSO Problem. In several upcoming 
derivations, it will be useful to view the system from a different standpoint. To begin, 
we construct several new variables that are simply permutations of the F p s and g p s. 
First we define A new matrices: 

C n = [f hn , f Pi „] G C MxP , for n = 1, . . . , A, (3.4) 



1 Although here we have applied an i\ norm to the £2 row energies of G, an £ p norm (where 
p < 1) could be used in place of the £\ norm if one is willing to deal with a non-convex objective 
function. Further, an £ q norm (where q > 2) rather than an £2 norm could be applied to each row of 
G because the purpose of the row operation is to collapse the elements of the row into a scalar value 
without introducing a sparsifying effect. 
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where f Pi „ is the nth column of F p . Next we construct N new vectors: 

h„ = [ gl [n],...,g P [n]] T eC p , forn = l,...,JV, (3.5) 

where g p [n] is the nth element of g p and T is the transpose operation. Given the C„s 
and h„s, we now pose the following system: 



d = Cihi H h CArhAr 

[Ci---Cjv] ' 



(3.6) 

Ctothtot- 




Due to (3.4, 3.5), the system in (3.6) is equivalent to the one in (3.1). The relationship 
between the g p s and h„s implies that if we desire to find a set of simultaneously sparse 
g p s to solve (3.1), we should seek out a set of h„s where many of the h„s equal an 
all-zeros vector, 0, but a few h„s are high in £2 energy (typically with all elements 
being nonzero). This claim is apparent if we consider the fact that H = [hi, . . . , hjv] 
is equal to the transpose of G, and that the g p s are only simultaneously sparse when 
|| G ||s is sufficiently small. 

Continuing with this alternate formulation, and given our desire to find a solution 
where most of the h„s are all-zero vectors and a few are dense, we relax the problem 
as follows: 

N \ 

-Ctothtotlla + A^llhnlh • (3-7) 

n=l J 

Fixing A to a sufficiently large value and solving this optimization yields many low- 
energy h„s (each close to 0), along with several dense high-energy h„s. Further, 
because X)n=i ll^nlh is equivalent to ||G||s, this means (3.7) is equivalent to (3.3), 
and thus just like (3.3), the approach in (3.7) finds a set of simultaneously sparse g p s. 

3.3. Differences between the SSMO and MSSO Problems. In the SSMO 
problem, we see from (2.4) that there are many different ds and a single F. The ratio 
of unknowns to knowns always equals N/M regardless of the number of observations, 
P. A large P when solving SSMO is actually beneficial because the simultaneous spar- 
sity of the underlying g p s becomes more exploitable; empirical evidence of improved 
sparsity profile recovery with increasing P may be found in both [7] and [33] . 

In contrast, we see from (3.1) that in the MSSO problem there is a single d and 
many different Fs. Here the ratio of unknowns to knowns is no longer constant with 
respect to P; rather it is equal to PN/M. We will show in Sec. 5 that as P increases, 
the underlying simultaneous sparsity of the g p s is not enough to combat the increasing 
number of unknowns, and that for large P, correctly identifying the sparsity profile 
of the underlying unknown g p s is a daunting task. 

4. Proposed Algorithms. We now derive algorithms to (approximately) solve 
the MSSO problem defined in Sec. 3. 

4.1. Matching Pursuit (MP). To begin, we extend the single-vector case of 
matching pursuit [34] to an MSSO context. The classic MP technique first finds 
the column of the system matrix that best matches with the observed vector and 
then removes from the observation vector the projection of this chosen column. It 
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proceeds to select a second column of the system matrix that best matches with the 
residual observation, and continues doing so until either K columns have been chosen 
(as specified by the user) or the residual observation ends up as a vector of all zeros. 
This algorithm is fast and computationally-efficient because the best-matching column 
vector during each iteration is determined simply by calculating the inner product of 
each column vector with the residual observation and ranking the resulting inner 
product magnitudes. 

Now let us consider the MSSO system as posed in (3.6). In the first iteration 
of standard MP, we seek out the single column of the system matrix that can best 
represent d. But in the MSSO context, we need to seek out which of the N C„ matrices 
can be best used to represent d when the columns of C„ undergo an arbitrarily- 
weighted linear combination. The key difference here is that on an iteration-by- 
iteration basis, we arc no longer deciding which column vector best represents the 
observation, but which matrix does so. The intuition behind this approach is that 
ideal solutions should consist of only a few dense h„s and many all-zeros h„s. For 
the fcth iteration of the algorithm, we need to select the proper index n G {1, . . . , N} 
by solving: 

q k = argmin min ||r fe _i - C„h„|||, (4.1) 

n h„ 

where q k is the index that will be selected and rk-i is the current residual observation. 
For fixed n, the solution to the inner minimization is obtained via the pseudoinverse, 
hT = C* yielding 

qk = argmin min ||r fe _i - C„(C] l rfc_i)||2 = argmax rj^CnC^i-fc-i, (4.2) 

n h n n 

where H is the Hermitian transpose. From (4.2) we see that, analogously to standard 
MP, choosing the best index for iteration k involves computing and ranking a series 
of inner-product-like quadratic terms. 

Algorithm 1 outlines the entire procedure. After K iterations, one obtains Ik C 
{1,...,N} (of cardinality T < K), a set indicating the chosen C„ matrices. The 
weights to apply to each chosen matrix (i.e., the corresponding h„s) are obtained via 
a finalization step; they are the best weightings in the £2 residual error sense with 
which to linearly combine the columns of the chosen C„ matrices to best match the 
observation d. Since T total matrices end up being chosen by the process, there is no 
penalty in retuning the T associated h„ vectors because they are allowed to be dense. 
The N — T other C„s (and corresponding h„s) are not used. 2 

One property of note is that if M < P, Algorithm 1 stops after one iteration. This 
is because C„C„ in this case is simply an M x M identity matrix for all n G {1, . . . , N}, 
and thus any one of the C„s is enough to represent d exactly when its columns are 
properly weighted and linearly combined. 

4.2. Orthogonal Matching Pursuit (OMP). In single- vector MP, the resid- 
ual r k always ends up orthogonal to the fcth column of the system matrix, but as the 
algorithm continues iterating, there is no guarantee the residual remains orthogonal 
to column k or is minimized in the least-squares sense with respect to the entire set 

2 From the perspective of F p s and g p s in (3.1), Algorithm 1 determines weights to place along 
only T rows of G (leaving the other N — T rows zeroed out) that still yields a good approximation 
of d in the £2 residual error sense. It is seeking out the best rows of G which, when densely filled, 
yield a sound approximation of d. 
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Algorithm 1 — MSSO Matching Pursuit 

Task: greedily choose up to K of the C n s to best represent d via Cihi + ■ • • + Cjvhjv. 
Data and Parameters: d and C n ,n € {1, . . . ,N} are given. K iterations. 
Precompute: Q n = C„C]„ for n € {1, ... , N}. 
Initialize: Set k = 0, r = d, Jo = 0, So = [ ]. 

Iterate: Set k — 1 and apply: 

• q k = argmax„ r^_ 1 Q„r fc _i. 

• if q k f£ h-i then 

h = h-i U {qk} 
S k = [Sfc_i,C QJk ] 
else 

Ik = Ik-1 

Sk = Sk-i 
end if 

• Tk = r fe _i - Q gfc r fe _i. 

• k — k + 1. Terminate loop if k > K or r k — 0. ix ends with T < K elements. 
Compute Weights: x = S^d, unstack x into h 91 , . . . ,h qT ; set remaining h n s to 0. 



of k chosen column vectors (indexed by q\,...,qk). Furthermore, K iterations of 
single-vector MP do not guarantee K different columns will be selected. Single-vector 
OMP is an extension to MP that attempts to mitigate these problems by improving 
the calculation of the residual vector. During the fcth iteration of single-vector OMP, 
column q k is chosen exactly as in MP (by ranking the inner products of the residual 
vector r/j_i with the various column vectors), but the residual vector is updated by 
accounting for all columns chosen up through iteration k rather than simply the last 
one [6,35]. 

To extend OMP to the MSSO problem, we choose matrix qk during iteration k as 
in MSSO MP and then in the spirit of single-vector OMP compute the new residual 
as follows: 

r fc =d-S fe (S^d), (4.3) 

where = [C 9l , . . . , C g J and Sj^d is the best choice of x that minimizes the residual 
error ||d — Sfex||2- That is, to update the residual we now employ all chosen matrices, 
weighting and combining them to best represent d in the least-squares sense, yielding 
an rfc that is orthogonal to the columns of Sk (and thus orthogonal to C 9l , . . . , C Qk ), 
which has the benefit of ensuring that OMP will select a new C„ matrix at each step. 

Algorithm 2 describes the OMP algorithm; the complexity here is moderately 
greater than that of MP because the pseudoinversion of an M x Pk matrix is required 
during each iteration k. 

4.3. Least Squares Matching Pursuit (LSMP). Beyond OMP there exists a 
greedy algorithm with an even greater computational complexity known as LSMP. In 
single- vector LSMP, one solves N — k + 1 least squares minimizations during iteration 
k in order to determine which column of the system matrix may be used to best 
represent d [6]. 

Thus to extend LSMP to MSSO systems, we must ensure that during iteration k 
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Algorithm 2 — MSSO Orthogonal Matching Pursuit 

Task: greedily choose up to K of the C n s to best represent d via Cihi + ■ ■ ■ + Cjvhjv. 
Data and Parameters: d and C„, n € {1, . . . , N} are given. K iterations. 
Precompute: Q n = C„C!„ for n € {1, ... , N}. 
Initialize: Set k = 0, ro = d, Io = 0, So = [ ]. 

Iterate: Set k = 1 and apply: 

• q k = argmax„ g J)k _ l r^ i _ 1 Q n r fe _i. 

• I k = Ifc_i U {g fc } 

• Sfc = [Sfc-i, C 9fc ] 

• r fc = d-S fc Sj:d. 

• fc = fc + 1. Terminate loop if k > K or r k = 0. Jx ends with T < K elements. 
Compute Weights: x = S^d, unstack x into h qi , . . . , h qT ; set remaining h n s to 0. 



we account for the k — 1 previously chosen C„ matrices when choosing the fcth one 
to best construct an approximation to d. Specifically, index qk is selected as follows: 

q k = argmin min ||S^ n) x - (4.4) 

n€{l,...,JV},ngJ fc _i x 

where Ik-i is the set of indices chosen up through iteration k — 1 , Sj^ = [Sfc_i, C n ], 
Sfc_i = [C qi , ■ ■ ■ , C 9fe _ 1 ], and x G C pfc . For fixed n, the solution of the inner iteration 

is x opt = (Sj^)td; it is this step that ensures the residual observation error will be 
minimized by using all chosen matrices. Substituting x opt into (4.4) and simplifying 
the expression yields 

q k = argmax d H Q^ l) d, (4.5) 

n <£ I k -i 

where Q W = (S^)(S^)t- 

Algorithm 3 describes the LSMP method. The complexity here is much greater 
than that of OMP because N — k + 1 pscudoinversions of an M x Pk matrix are 
required during each iteration k. Furthermore, the dependence of on both n and 
k makes precomputing all such matrices infcasiblc in most cases. One way to decrease 
computation and runtime might be to extend the projection-based recursive updating 
scheme of [6] to MSSO LSMP. 

4.4. Iteratively Reweighted Least Squares (IRLS). Having posed three 
greedy approaches for solving the MSSO problem, we now turn our attention toward 
minimizing (3.7), the relaxed objective function. Here, the regularization term A is 
used to trade off simultaneous sparsity with residual observation error. 

One way to minimize (3.7) is to use an IRLS-based approach [28]. To begin, 
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Algorithm 3 — MSSO Least Squares Matching Pursuit 

Task: greedily choose K of the ds to best represent d via Cihi + ■ ■ ■ + Cjvhjv. 

Data and Parameters: d and C„, n 6 {1, ... , iV} are given. K iterations. 

Initialize: Set fe = 0, I = 0, So = [ ]. 

Iterate: Set k = 1 and apply: 

. q k = argmax„ g Jfc _ 1 d H (S< n) )(S<*°)td, where S<"> = [S fc _i,C„] 

• I k = h-i U {qk} 

• Sfc = [Sfc_l, Cq k ] 

• k = k + 1. Terminate loop if fc > K or rfc = 0. Ik ends with T < K elements. 
Compute Weights: x = S^d, unstack x into h 91 , . . . , h 9T ; set remaining h n s to 0. 



consider manipulating the right-hand term of (3.7) as follows: 

*2JM2-A2^ |iT-N--A2^ ^r-r 



n=l 



Jh„||2 ^ ll h n||2 

n— 1 11 11 n— 1 11 11 



A* 



-^[h;[i],...,K[p]] 



n=l 

AT 



2 

l|h„|| 2 +e 



l|2+e J 



h„[l] 
h„[P] 



- n v h"w h 



n=l 



Wi 



w 



AT 



hi 
h w 



^ h tot w toth to t, 



(4.6) 



where * is the complex conjugate of a scalar, W„ is a P x P real-valued diagonal 
matrix whose diagonal elements each equal 2/(||h„|| 2 + e), and e is some small non- 
negative value introduced to mitigate poor conditioning of the W„s. If we hx W to t £ 
^pnxpn ky com p U ti n g it using some prior estimate of h tot , then the right-hand 
term of (3.7) becomes a quadratic function and (3.7) transforms into a Tikhonov 
optimization [44,45]: 



mm 

h tot 



{in- 



Ctothtotlb + ■fh^ W to t htot | • 



(4.7) 



Finally, by performing a change of variables and exploiting the properties of Wtot, 
we can convert (4.7) into an expression that may be minimized using the robust 
and reliable conjugate-gradient (CG) least-squares solver LSQR [37,38], so named 
because it applies a QR decomposition [22] when solving the system in the least- 
squares sense. LSQR works better in practice than several other CG methods [1] 
because it restructures the input system via the Lanczos process [31] and applies a 
Golub-Kahan bidiagonalization [21] prior to solving it. 

1/2 

To apply LSQR to this problem, we first construct W tot as the element-by- 
element square-root of the diagonal matrix W to t and then take its inverse to obtain 



10 



A. C. ZELINSKI ET AL. 



Algorithm 4 — MSSO Iteratively Rcwcighted Least Squares 

Task: Minimize { | |jd — Ctothtot \\\ + A X^=i ll n "-ll 2 } using an iterative scheme. 

Data and Parameters: A, d, Ctot, <5, and K are given. 

Initialize: Set k — and h to t,fc=o = (Ctot^d (or e.g. h to t,)c=o = !)• 

Iterate: Set k = 1 and apply: 

• Create W to t from h to t,fc-i; construct W^ 2 , W^ 1 / 2 , and let A = CtotW^ 1 / 2 . 

• Obtain q tmp by using LSQR to solve min q {||d — Aq||| + A||q|||}. 

• Set htot.tmp = W~y 2 q tmp . 

• Line search: find fj,o £ [0, 1] such that (1 — /u)htot,fc-i + ^htot.tmp minimizes (3.7). 

• Set htot.fc = (1 — A t o)htot,fc-i + Mohtot,tmp- 

• k = k + 1. Terminate loop when k > K or (3.7) decreases by less than S. 
Finalize: Unstack the last htot solution into hi, . . . , hjv. 

w tot /2 - Defining q = W^ t 2 h tot and A = CtotW^ 2 , (4.7) becomes: 

mm {||d-Aq||2 + A||q||2}_ (48) 

This problem may be solved directly by simply providing d, A, and A to the LSQR 
package because LSQR is formulated to solve the exact problem in (4.8). Call- 
ing LSQR with these variables yields q opt , and the solution h^* is backed out via 

W- 1 / 2 opt 
tot 4 

Algorithm 4 outlines how one may iteratively apply (4.8) to attempt to find a 
solution that minimizes the original cost function, (3.7). The technique iterates until 
the objective function decreases by less than S or the maximum number of iterations, 
K, is exceeded. The initial solution estimate is obtained via pseudoinversion of Ctot 
(an all-zeros initialization would cause W to t to be poorly conditioned) . A line search is 
used to step between the prior solution estimate and the upcoming one; this improves 
the rate of convergence and ensures the objective decreases at each step. This method 
is global in the sense that all PN unknowns are being estimated concurrently per 
iteration. 

4.5. Row-by-Row Shrinkage (RBRS). The proposed IRLS technique solves 
for all PN unknowns during each iteration, but this is cumbersome when PN is large. 
An alternative approach is to apply an inner loop that fixes n and then iteratively 
tunes h„ while holding the other h m s (m ^ n) constant; thus only P (rather than 
PN) unknowns need to be solved for during each inner iteration. 

This idea inspires the RBRS algorithm. The term "row-by-row" is used because 
each h„ corresponds to row n of the G matrix in (3.3), and "shrinkage" is used because 
the £2 energy of most of the h„s will essentially be "shrunk" (to some extent) during 
each inner iteration: when A is sufficiently large and many iterations are undertaken, 
many h n s will be close to all-zeros vectors and only several will be dense and high in 
energy. 

4.5.1. RBRS for real-valued data. Assume d and the C„s of (3.7) are real- 
valued. We now seek to minimize the function by extending the single-vector sequen- 
tial shrinkage technique of [15] and making modifications to (3.7). Assume we have 
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prior estimates of hi through h N , and that we now desire to update only the jth 
vector while keeping the other N — 1 fixed. The shrinkage update of h, is achieved 



via: 



{i || [S^ =1 C„h„ - Cfo] + C,x - d\\l + A ||x|| 2 } , 



(4.9) 



where T,^ =1 C n h. n — Cjhj forms an approximation of d using the prior solution coef- 
ficients, but discards the component contributed by the original jth vector, replacing 
the latter via an updated solution vector, x. The left-hand term promotes a solution 
x that keeps residual error down, whereas the right-hand term penalizes xs that con- 
tain nonzeros. It is crucial to note that the right-hand term does not promote the 
clcment-by-element sparsity of x; rather, it penalizes the overall £2 energy of x, and 
thus both sparse and dense xs are penalized equally if their overall £ 2 energies are the 
same. 

One way to solve (4.9) is to take its derivative with respect to x T and find x such 
that the derivative equals 0. By doing this and shuffling terms, and assuming we have 
an initial estimate of x, we may solve for x iteratively: 



c;c j + 



A 



|Xi_i|| 2 + e 



Cjrj, 



(4.10) 



where = d + Cjhj — £^ =1 C„h„, I is a P x P identity matrix, and e is a small 



value that avoids ill-conditioned results. 3 By iterating on (4.10) until (4.9) changes 
by less than (5i nner , we arrive at a solution to (4.9), x opt , and this then replaces the 
prior solution vector, hj. Having completed the update of the jth vector, we proceed 
to update the rest of the vectors, looping this outer process K times or until the 
main objective function, (3.7), changes by less than <5 uter- Algorithm 5 details the 
entire procedure; unlike IRLS, here we essentially repeatedly invert P x P matrices 
to pursue a row-by-row solution, rather than PN x PN matrices to pursue a solution 
that updates all rows per iteration. 

4.5.2. Extending RBRS to complex- valued data. If (3.7) contains complex- 
valued terms, we may structure the row-by-row updates as in (4.9), but because the 
derivative of the objective function in (4.9) is more complicated due to the presence 
of complex- valued terms, the simple update equation given in (4.10) is no longer ap- 
plicable. One way to overcome this problem is to turn the complex-valued problem 
into a real- valued one. 

Let us create several real- valued expanded vectors: 



Re(d) 
Im(d) 



r>2M 



Re(h„) 
Im(h n ) 



c2P 



(4.11) 



as well as real- valued expanded matrices: 
C n = 



Re(C„) -Im(C„) 
Im(C„) Re(C„) 



p 2Mx2P 



(4.12) 



3 Eq. (4.10) consists of a direct inversion of a P X P matrix, which is acceptable in this paper 
because all experiments involve P < 10. If P is large, (4.10) could be solved via a CG technique 
(e.g., LSQR). 
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Algorithm 5 — MSSO Row-by-Row Sequential Iterative Shrinkage 

Task: Minimize || ||d — Ctothtot \\\ + A X^=i ll n ™ll 2 } using an iterative scheme when all 
data is real-valued. 

Data and Parameters: A, d, C n (n G {1, . . . , iV}), 5 utcr, dinner, K, and I are given. 

Initialize: Set k — and htot = (Ctot^d (or e.g. htot = 1), unstack into hi, ... , hjv. 

Iterate: Set k = 1 and apply: 

• Sweep over row vectors: set j = 1 and apply: 

o Optimize a row vector: set i = 1 and xo = h, and then apply: 



Cjrj, where r, = d + Cjhj — E^Lxdh,, 



• i = i + 1. Terminate when i > / or (4.9) decreases by less than <5i nner . 

o Finalize row vector update: set hj to equal the final x. 

o j = j + 1. Terminate loop when j > N. 

• k — k + 1. Terminate loop when k > K or (3.7) decreases by less than <5 utcr- 
Finalize: If A was large enough, several h n s should be dense and others close to 0. 



Due to the structure of (4.11, 4.12) and the fact that ||h„||2 equals ||h„||2, the following 
optimization is equivalent to (3.7): 




JV 



J2 c « h - 



n=l 



N 



\J2\\Kh\. (4.13) 



n=l 



This means we may apply RBRS to complex-valued scenarios by substituting the h„s 
for the h„s and C„s for the C n s in (4.9, 4.10) and Algorithm 5. [Eq. (4.10) becomes 
an applicable update equation because (4.9) will consist of only real-valued terms 
and the derivative calculated earlier is again applicable.] Finally, after running the 
algorithm to obtain finalized h„s, we may simply restructure them into complex h„s. 

4.6. Column-by-Column Shrinkage (CBCS). Here we propose a dual of 
RBRS — a technique that sequentially updates the columns of G (i.e., the g p s) in (3.1, 
3.3) rather than its rows (the h„s). Interestingly, we will show that this approach 
yields a separable optimization and reduces the overall problem to simply repeated 
element-by- element shrinkages of each g p . 

4.6.1. CBCS for real-valued data. Assume the g p s, F p s, and d in (3.3) are 
real-valued and that we have prior estimates of the g p s. Let us consider updating the 
pth vector while keeping the other P — 1 fixed. This reduces (3.3) to 

min |i ||r - F p x||^ + A VMn]) 2 + b[n] j , (4.14) 

where x will be the update of g , and r and b are as follows: 

p 

r = d + F pgp -^F 9 g 9 , (4.15) 

9=1 
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and 

p 

b[n] = -(g p [n]) 2 + ]T(g g [n]) 2 , for n = 1, . . . , N. (4.16) 

q=l 

If the h[n]s were not present, (4.14) would reduce to the standard problem iterated 
shrinkage is intended to solve [15, 16]. 

Now let us apply a proximal relaxation [5,8, 19] to (4.14) and seek a solution 
x e M. N as a shrinkage update of g p : 



1(1 



N 



F P x|| 2 + a ||x - g p \\l - ||F p (x - g p )\\ 2 2 ) V(xW) 2 + b[n] 



(4.17) 

where a is chosen such that al — FjFp is positive definite (e.g., a may be set to the 
maximum singular value of F p ). The idea here is to replace g p with the solution x 
and then iterate this procedure, repeatedly solving (4.17). This ultimately yields an 
updated x that globally minimizes (4.14) because the proximal method is guaranteed 
to arrive at a local minimum [8,19] and (4.14) itself is convex. Having obtained x, we 
perform an update, g p = x, and then repeat the overall process for the next g p , and 
so forth. Additionally, we add a layer of iteration on top of this column-by-column 
sweep, optimizing each of the P vectors a total of K times. 

The only obstacle that remains in order for us to implement the entire algorithm 
is an efficient way to solve (4.17). We pursue such an approach by first expanding the 
terms of (4.17): 

min jc + v T x + |x T x + A £j V( X W) 2 + b[n] j , (4.18) 

where c = ±r T r + § gjg p - igjFjF p g p and v = FjF p g p - ag p - Fjr. Since c is 
constant, we may ignore it in the optimization. Upon closer inspection, we see that 
(4.18) is a separable problem and that the individual scalar elements of x may be 
optimized independently For the nth element of x, (4.18) simplifies to: 

min f v[n]x[n] + ^(x[n]) 2 + Av/(x[n]) 2 + b[n]| , (4.19) 

x[n] L Z J 

Having burrowed down to an element-by-element problem, all that remains is to 
efficiently solve (4.19). One approach is to compute the derivative of its objective 
with respect to x[n] and find x[n] such that the derivative equals zero. The derivative 
equals the following nonlinear scalar equation: 

v[n] + x[n] (a + A ) . (4.20) 

Setting the derivative in (4.20) to zero and assuming we have an initial estimate of 
x[n], we may solve for x[n] iteratively as follows: 

(x[n]i) = -v[n] ( a + X \ , (4.21) 
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Algorithm 6 — MSSO Column-by-Column Sequential Iterative Shrinkage 
Task: Minimize { | |jd — Ftotgtotll2 + ^ ll^lls} wnen au data is real-valued. 
Data and Parameters: A, d, F p ,p 6 {1, ... , P}, <$tot, <5vcc, <5 c lem> J, I are given. 
Initialize: g tot = (Ftot)*d; split into g 1; . . . , g P ; set a = max. sing. val. among F p s. 

Iterate: Set k = 1 and apply: 

• Sweep over column vectors: set p — 1 and apply: 

o Optimize a column vector: set j = 1 and apply: 

• Construct r = d + F pgp - F qSq . 

. Construct b[l] = -(g p [/]) 2 + E^feJI]) 2 , for I = 1, . . . ,N. 

• Construct v = FjF p g p - ag p - Fjr. 

• Set x = g p . 

• Sweep over column elements: set n = 1 and apply: 

o Optimize nth element of x: set i = 1 and apply: 

• (x[n]i) = -v[n] fa + ^— ) . 

• i = i + 1. Stop if i > I or (4.19) decreases by less than <5 e i cm . 
o n = n + 1. Terminate when n > N. 

• Update column vector: set g p to equal the final x. 

• j — j + 1- Terminate when j > J or (4.14) decreases by less than <5 V cc- 
o p — p + 1. Terminate when p > P. 

• k — k + 1. Terminate loop when k > K or (3.3) decreases by less than 5tot- 
Finalize: If A was sufficiently large, gl , . . . , g p should be simultaneously sparse. 



where e is simply a small value that avoids ill-conditioned scenarios. 

We may now formulate CBCS as Algorithm 6. As we seek to update a fixed g 1; 
note how we iteratively tune its N elements, one at a time, via (4.21), but instead 
of moving on immediately to update g 2 , we update g l5 r, v, and b, and tune over 
the elements of g x yet again, doing this repeatedly until the per-vector objective, 
(4.14), stops decreasing — only then moving on to g 2 . Empirically, we find this greatly 
speeds up the rate at which the g p s converge to a simultaneously sparse solution, but 
unfortunately, even with this extra loop, CBCS still requires excessive iterations for 
larger problems (see Sec. 5). Similarly to RBRS in Algorithm 5, note how the inner 
loops are cut off when the objective function stops decreasing to within some small 
value 5 or some fixed number of iterations has been exceeded. 

4.6.2. Extending CBCS to complex- valued data. If (3.3) contains complex- 
valued terms, we may structure the column-by-column updates as in (4.14, 4.17), but 
the expansion and derivative of the latter equation's objective function does not lend 
itself to the simple update equations given in (4.18, 4.19, 4.21). One way to over- 
come this problem is to turn the complex-valued problem into a real-valued one. This 
approach is not equivalent to the one used to extend RBRS to complex data. 
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First we stack the target vector, d, into a real-valued vector: 

d = 



Re(d) 
Im(d) 



l>2M 



and then split, rather than stack, the unknown vectors into 2P new vectors: 



(Re) 



Re(g p ) G R N , g p Im) = Im(g p ) 6 R N , for p = 1, . . . , P. 



(4.22) 



(4.23) 



We then aggregate these vectors into G = [gf 10 ^, gi Im \ • ■ • , gj? c \ gp" 1 '] • Next, we 
split each F p into two separate matrices, for p = 1, . . . , P: 



F (A) = 

V 



Re(F p ) 
Im(F p ) 



^2MxN p(B) _ 



-Im(F p ) 
Re(F p ) 



.2MxJV 



(4.24) 



yielding 2P new real- valued matrices. 

Due to the structure of (4.22, 4.23, 4.24), the following optimization is equivalent 
to (3.3): 



d-EF p A) gr-f>fg 

p—1 p — 1 



(Im) 
V 



N 



+ A E 



n—1 \ p. 



(4.25) 



^(g p Rc '[n])2 + ^(g p Im) [n]) 2 - 



P =i 



The equivalence arises because the first and second terms of (4.25) are equivalent to 
|||d-Ftotgtotlli and ll G lls in (3.3), respectively. 

This means we may apply CBCS to complex-valued problems by performing 
column- by-column optimization over the IP real- valued unknown vectors. This works 
because CBCS will pursue solutions where the gi Re \ gi Im \ • • • , g^ e \ g P ' m ' vectors 
are simultaneously sparse, which is equivalent to pursuing simultaneously sparse 
g 1 ,...,gpS. After running CBCS on the 2P vectors, one may simply restructure 
them into P complex-valued g p s. 

Finally, let us set P = 1 and thus consider the case of single-vector sparse ap- 
proximation. The above derivations show that seeking a single sparse complex-valued 
vector is equivalent to seeking two simultaneously sparse real-valued vectors. In 
other words, single- vector sparse approximation of a complex vector readily maps to 
the MSSO problem, increasing the applicability of algorithms that solve the latter. 

4.7. Second-Order Cone Programming (SOCP). We now propose a sev- 
enth and final algorithm to solve the MSSO problem as given in (3.3). We branch 
away from the shrinkage approaches that operate on individual columns or rows of 
the G matrix and again seek to concurrently estimate all PN unknowns. Rather 
than using an IRLS technique, however, we pursue a second-order cone programming 
approach, motivated by the fact that second-order cone programs may be solved via 
efficient interior point algorithms [42,46] and are able to encapsulate conic, convex- 
quadratic [36], and linear constraints. (Quadratic programming is not an option 
because the g p s, F p s, and d may be complex.) 

Second-order conic constraints are of the form a = [ai, . . . , ajv] T such that 



,a N -i] h < a N- 



(4.26) 
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The generic format of an SOC program is 

min c T x s.t. Ax = b and x £ K, (4.27) 

X 

where K = xLi x ••■ xL^, is the A-dimensional positive orthant cone, and 
the L„s are second-order cones [36]. To convert (3.3) into the SOC format, we first 
write 

min {±s + Al T t} 

S.t. Z = dtot-Ftotgtot and Il z ll2 < s 

and ||[Re(g 1 [n]),Im(g 1 [n]), . . . , Re(g P [n]), Im(g P [n])] T || 2 < t 

where n £ {1, . . . , N} and t — [ti, . . . , ijv] T - The splitting of the complex elements of 
the g p s mimics the approach used when extending CBCS to complex data, and (4.28) 
makes the objective function linear, as required. Finally, in order to represent the 
||z||| < s inequality in terms of second-order cones, an additional step is needed. Given 
that s = j(s + l) 2 — j(s — l) 2 , the inequality may be rewritten as z H z + j(s — l) 2 < 
j(s+l) 2 and then expressed as a conic constraint: ||[z T , |(s— 1)] T ||2 < |(s+l) [32,36]. 
Applying these changes yields 

min {|s + Al T t} 

s.t. z = d to t -Ftotgtot and ll[ zT I '"] T ||2 < v, 
u=(s- l)/2, v = (s + l)/2, s > 0, 
and ||[Re(g 1 [n]),Im(g 1 [n]), . . . , Re(g P [n]), Im(gp[n])] T || 2 <t 

which is a fully-defined SOC program that may be implemented and solved numeri- 
cally. There is no Algorithm pseudocode for this technique because having set up the 
variables in (4.29), one may simply plug them into an SOCP solver. In this paper we 
implement (4.29) in SeDuMi (Self-Dual-Minimization) [42], a free software package 
consisting of MATLAB and C routines. 

5. Experiments and Results. Our motivation for solving MSSO sparse ap- 
proximation problems comes from MRI RF excitation pulse design. Due to the NP- 
hardness of the problem (3.2), there is no reasonable way to check the accuracy of 
approximate solutions to these problem instances obtained with the algorithms intro- 
duced here. Thus, before turning to the MRI RF excitation pulse design problem in 
Sec. 5.3, we present several synthetic experiments. These experiments allow compar- 
isons amongst algorithms and also empirically reveal some properties of the relaxation 
(3.3). Theoretical exploration of this relaxation is also merited but is beyond the scope 
of this manuscript. 

All experiments are performed on a Linux server with a 3.0-GHz Intel Pentium 
IV processor. The system has 16 gigabytes of random access memory, ample to ensure 
that none of the algorithms require the use of virtual memory and to avoid excessive 
hard drive paging. MP, LSMP, IRLS, RBRS, CBCS are implemented in MATLAB, 
whereas SOCP is implemented in SeDuMi. The runtime of any method could be 
reduced significantly by implementing it in a completely compiled format such as C. 
Note: OMP is not evaluated because its performance always falls in between that of 
MP and LSMP. 

5.1. Sparsity Profile Estimation in a Noiseless Setting. 



(4.28) 



(4.29) 
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5.1.1. Overview. We now evaluate how well the algorithms of Sec. 4 estimate 
sparsity profiles when the underlying g p s are each strictly and simultaneously K- 
sparse and the observation d of (3.1) is known exactly and not corrupted by noise. 
This corresponds to a high-SNR source localization scenario where the sparsity profile 
indicates locations of emitters and our goal is to find the locations of these emitters 
[26,30,32,33]. Our goal is to get an initial grasp of the challenges of solving the MSSO 
problem. 

We synthetically generate real- valued sets of F p s and g p s in (3.1), apply the 
algorithms, and record the fraction of correct sparsity profile entries recovered by 
each. We vary M in (3.1) to see how performance at solving the MSSO problem 
varies when the F p s are underdetermined vs. overdetermined and also vary P to see 
how rapidly performance degrades as more system matrices and vectors are employed. 

5.1.2. Details. For all trials, we fix N = 30 in (3.1) and K = 3, which means 
each g p vector consists of thirty elements, three of which are nonzero. We consider P G 
{1,2,..., 8}, and M G {10, 15, ... , 40}. For each of the fifty-six fixed (M, P) pairs, 
we create 50 random instances of (3.1). Each of the 2,800 instances is constructed 
and evaluated as follows: 

• Pick a if-element subset of {1, . . . , N} uniformly at random. This is the sparsity 
profile. 

• Create P total iV-element vectors, the g p s. The K elements of each that corre- 
spond to the sparsity profile are filled in with draws from a Gaussian ~ Af(0, 1) 
distribution; all other elements are set to zero. 

• Create P total M xN matrices, the F p s. Each element of each matrix is determined 
by drawing from Af(0, 1); each column of each matrix is normalized to have unit £2 
energy. 

• Compute d = J2p=i F P g p - Shuffle F p s and g p s into C„s and h„s via (3.4, 3.5). 

• Apply the algorithms: 

o MP, LSMP: iterate until K elements are chosen or the residual approximation is 
0. If less than K terms are chosen, this hurts the recovery score. 

o IRLS, RBRS, CBCS, SOCP: approximate a A oracle: proxy for a good choice 
of A by looping over roughly seventy As in [0,2], running the given algorithm 
each time. This sweep over A results in high-energy, dense solutions through 
negligible-energy, all-zeros solutions. For each of the estimated g tot s (that vary 
with A), estimate a sparsity profile by noting the largest £2 energy rows of the 
associated G matrix. 4 Remember the highest fraction recovered across all As. 
After performing the above steps, we average the results of the 50 trials associated with 
each fixed (M, P) to yield the average fraction of recovered sparsity profile elements. 

5.1.3. Results. Each subplot of Fig. 5.1 depicts the average fraction of recov- 
ered sparsity profile elements versus the number of knowns, M, for a fixed value 
of P, revealing how performance varies as the F p G ]R MxAr matrices go from being 
underdetermined to overdetermined. 

Recovery Trends. As the number of knowns M increases, recovery rates improve 
substantially, which is sensible. For large M and small P, the six algorithms behave 

4 For example, if the true sparsity profile is {1,2,9} and the largest £2 energy rows of G are 
{2,7,8}, then the fraction of recovered sparsity profile terms equals i. Now suppose only two rows 

of G have nonzero energy and the profile estimate is only {7, 8}. The fraction recovered is now zero. 
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Fig. 5.1. Sparsity Profile Estimation in a Noiseless Setting. Subplots depict average 
fraction of sparsity profile elements recovered over 50 trials of six algorithms as M is varied. P is 
fixed per subplot, and N = 30 and K = 3 for all trials. Data is generated as described in Sec. 5.1.2. 
Recovery scores for IRLS, RBRS, CBCS, and SOCP assume a good choice of X is known. For large 
M, all algorithms exhibit high recovery rates; for large P, small M, or both, the algorithms that seek 
to minimize (3.3, 3.7) generally outperform those that greedily pursue a solution. 



similarly, consistently achieving nearly 100% recovery. For large P and moderate M, 
however, sparsity profile recovery rates are dismal — as P increases, the underlying 
simultaneous sparsity of the g p s is not enough to combat the increasing number of 
unknowns, PN. As M is decreased and especially when P is increased, the per- 
formance of the greedy techniques falls off relative to that of IRLS, RBRS, CBCS, 
and SOCP, showing that the convex relaxation approach itself is a sensible way to 
approximately solve the formal NP-Hard combinatorial MSSO simultaneous sparsity 
problem. Furthermore, the behavior of the convex algorithms relative to the greedy 
ones coincides with the studies of greedy vs. convex programming sparse approxima- 
tion methods in single- vector [4,6] and SSMO contexts [7]. Essentially, in contrast 
with convex programming techniques, the greedy algorithms only look ahead by one 
term, cannot backtrack on sparsity profile element choices, and do not consider up- 
dating multiple rows of unknowns of the G matrix at the same time. LSMP tends to 
perform slightly better than MP because it solves a least squares minimization and 
explicitly considers earlier chosen rows whenever it seeks to choose another row of G. 

Convergence. Across most trials, IRLS, RBRS, CBCS, and SOCP converge 
rapidly and do not exceed the maximum limit of 500 outer iterations. The excep- 
tion is CBCS when M is small and P = 8: here, the objective function frequently 
fails to decrease by less than the specified 5 — 10~ 5 . 
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(M 


P) 




Algorithm 


(10,8) 


(20,1) 


(30,5) 


(40,8) 


MP 


5.4 


1.8 


2.6 


4.0 


LSMP 


11.4 


5.6 


15.6 


27.6 


IRLS 


92.6 


10.1 


73.2 


175.0 


RBRS 


635.7 


36.0 


236.8 


401.6 


CBCS 


609.8 


7.1 


191.4 


396.3 


SOCP 


44.3 


37.0 


64.3 


106.5 



Table 5.1 



Average Algorithm Runtimes for Noiseless Sparsity Profile Estimation. For several 
fixed (M, P) pairs, each algorithm's average runtime over the corresponding 50 trials is given in units 
of milliseconds; TV = 30 and K = 3 for all trials (runtimes of the latter four algorithms are also 
averaged over the multiple A runs per trial). MP is substantially faster than the other techniques, 
as expected. For larger problems, e.g. (M,P) = (10,8), the runtimes of both RBRS and CBCS are 
excessive relative to those of the other convex minimization techniques, IRLS and SOCP. 



Runtimes. For several fixed (M,P) pairs, Table 5.1 lists the average runtimes 
of each algorithm across the 50 trials associated with each pair. 5 For IRLS, RBRS, 
CBCS, and SOCP, runtimes are also averaged over the many A runs. Among the 
convex minimization methods, SOCP seems superior given its fast runtimes in three 
out of four cases. Peak memory usage is not tracked here because it is difficult to do 
so when using MATLAB for such small problems; it will be tracked during the third 
experiment where the system matrices are vastly larger and differences in memory 
usage across the six algorithms are readily apparent. 

Closer Look: Solution Vectors. We now observe how the algorithms that seek to 
minimize the convex objective behave during the 43rd trial when K = 3, N = 30, 
M = 10, and P = 1, corresponding to the base case problem of estimating one 
sparse real- valued vector, g x . Fig. 5.2 illustrates estimates obtained by SOCP, CBCS, 
RBRS, and IRLS when A = 0.03; for each algorithm, a subplot shows elements of both 
the estimated and actual g 1; and lists the estimated sparsity profile (ESP), number 
of profile terms recovered, and value of the objective function given in (3.3, 3.7). 
Although RBRS, CBCS, and SOCP yield slightly different solutions (among which 
SOCP yields the best profile estimate), they all yield an objective function equal to 
0.028 ± 10~ 5 . Convex combinations of the three solutions continue to yield the same 
value, suggesting that the three algorithms have found solutions among a convex set 
that is the global solution to the objective posed in (3.3, 3.7). Given the fact that in 
this case SOCP outperforms RBRS and CBCS, we see that even the globally optimal 
solution to the relaxed convex objective does not necessarily optimally solve the true 
iv-sparse profile recovery problem. In contrast to the other methods, IRLS yields 
a slightly higher objective function value, 0.030, and its solution vector is not part 
of the convex set — it does however correctly determine 2 of the 3 terms of the true 
sparsity profile. 

Closer Look: Objective Function Behavior. Concluding the experiment, Fig. 5.3 
plots the objective vs. A for the 25th trial when M = 30 and P = 6, studying how 
the objective (3.3, 3.7) varies with A when applying SOCP, CBCS, RBRS, and IRLS. 
For all seventy values of A £ [0, 2], SOCP, CBCS, and RBRS generate solutions that 
yield the same objective function value. For A < |, IRLS attains the same objective 
function value as the other methods, but as A increases, IRLS is unable to minimize 



5 In the interest of space we do not list average runtimes for all fifty-six (M, P) pairs. 
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Fig. 5.2. Noiseless Sparsity Profile Estimation with IRLS, RBRS, CBCS, SOCP. 

Here M = 10, N = 30, P = 1, and K = 3. The algorithms are applied with A fixed at 0.03 and 
attempt to estimate the single unknown vector, g 1; along with the sparsity profile. Subplots depict 
the elements of both the estimated and actual g 1( along with the estimated sparsity profile (ESP), 
number of profile terms recovered, and objective function value. SOCP leads to a superior sparsity 
profile estimate, and SOCP, RBRS, and CBCS seem to minimize the convex objective given in (3.3, 
3.7). IRLS does not, but still manages to properly identify 2 out of 3 sparsity profile terms. 



objFun vs. X (N=30, M=30, P=6, trial 25) 




Fig. 5.3. Noiseless Sparsity Profile Estimation: Objective Function Behavior. For 

the 25 th trial of the (M, P) = (30, 6) series, SOCP, CBCS, RBRS, and IRLS are used to solve (3.3, 
3.7) for 70 values of A g [0,2]; the value of the objective function vs. A is given above. For A > -j, 
IRLS's solutions do not minimize the objective as well as those produced by the three other methods. 



the objective function as well as SOCP, RBRS, and CBCS. The behavior in Fig. 5.3 
occurs consistently across the fifty trials of the other (M, P) pairs. 

5.2. Sparsity Profile Estimation in the Presence of Noise. 

5.2.1. Overview. We now evaluate how well the algorithms of Sec. 4 estimate 
sparsity profiles when the underlying g p s are each strictly and simultaneously K- 
sparse and the observation d of (3.1) is corrupted by additive white Gaussian noise. 
The signal-to-noise ratio (SNR) and K are varied across sets of Monte Carlo trials in 
order to gauge algorithm performance across many scenarios. For a given trial with 
a fixed SNR level in units of decibels (dB), the M elements of the true observation 
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vector, d true , are corrupted with independent and identically distributed (i.i.d.) zero- 
mean Gaussian noise with variance cr 2 , related to the SNR as follows: 



This noise measure is analogous to that of [7]. 

5.2.2. Details. We fix TV = 30, M = 25, and P = 3, and consider SNR G 
{-10, -5,0,..., 25, 30} and K G {1,3,5,7,9}. For each fixed (SNR, K) pair, we 
generate 100 noisy observations and apply the algorithms as follows: 

• Generate the sparsity profile, g p s, F p s, h„s, and C„s as in Sec. 5.1.2. The g p s are 
simultaneously if-sparse and all terms are real- valued. 

• Compute dt r uo = Fig! H h F P g P . 

• Construct d no ; sy = d trU o + n where n ~ Af(0, cr 2 I) and a 2 is given by (5.1). 

• Apply the algorithms by providing them with d no i sy and the system matrices: 

o MP, LSMP: iterate until K elements are chosen or the residual approximation is 
0. If less than K terms are chosen, this hurts the recovery score. 

o IRLS, RBRS, CBCS, SOCP: using a pre-determined fixed A (see below), apply 
each algorithm to obtain estimates of the unknown vectors and sparsity profiles. 
After performing the above steps, we average the results of the 100 trials associated 
with each fixed (SNR, K, alg) triplet to yield the average fraction of sparsity profile 
elements that each algorithm recovers. 

Control Parameter Selection. The A mentioned in the list above is determined as 
follows: before running the overall experiment, we generate three noisy observations 
for each (SNR, K) pair. We then apply IRLS, RBRS, CBCS, and SOCP, tuning the 
control parameter A by hand until finding a single value that produces reasonable 
solutions. All algorithms then use this hand-tuned, fixed A and are applied to the 
other 100 noisy observations associated with the (SNR, K) pair under consideration. 
Thus, in distinct contrast to the noiseless experiment, we no longer assume an ideal 
A is known for each denoising trial. 

5.2.3. Results. Each subplot of Fig. 5.4 depicts the average fraction of recov- 
ered sparsity profile elements versus SNR for a fixed K, revealing how well the six 
algorithms are able to recover the K elements of the sparsity profile amidst noise in 
the observation. Each data point is the average fraction recovered across 100 trials. 

Recovery Trends. When K = 1, we see from the upper-left subplot of Fig. 5.4 
that all algorithms have essentially equal performance for each SNR. Recovery rates 
improve substantially with increasing SNR, which is sensible. For each algorithm, 
we see across the subplots that performance generally decreases with increasing K; 
in other words, estimating a large number of sparsity profile terms is more difficult 
than estimating a small number of terms. This trend is evident even at high SNRs. 
For example, when SNR is 30 dB and K = 7, SOCP is only able to recover ~ 70% 
of sparsity profile terms. When K = 9, the recovery rate falls to <~ 60%. For low 
SNRs, e.g., -5 dB, all algorithms tend to perform similarly, but the greedy algorithms 
perform increasingly worse than the others as K goes from moderate-to-large and SNR 
surpasses zero dB. In general, MP performs worse than LSMP, and LSMP in turn 
performs worse than IRLS, SOCP, RBRS, and CBCS, while the latter four methods 
exhibit essentially the same performance across all SNRs and Ks. For K = 3, MP's 
performance falls off relative to IRLS, SOCP, RBRS, and CBCS, but LSMP's does 




dtrucl^lO- 3 ™/ 



(5.1) 



22 



A. C. ZELINSKI ET AL. 





-socp ■ 



10 20 

SNR (dB) 

-cbcs — K-rbrs — — iris - 




-Ismp 



-mp 




S 0.6 
g>0.4 



10 20 

SNR (dB) 



10 20 
SNR (dB) 



10 20 

SNR (dB) 



Fig. 5.4. Sparsity Profile Estimation in the Presence of Noise. Each subplot depicts the 
average fraction of recovered sparsity profile elements versus SNR for a fixed K , revealing how well 
the algorithms recover the K elements of the sparsity profile amidst noise in the observation. Each 
data point is the average fraction recovered across 100 trials; data is randomly generated as described 
in Sec. 5.2.2. N,M and P are always fixed at SO, 25, and 3, respectively. For each (SNR, K) pair, 
a "good" lambda is chosen by denoising a few cases by hand and then using this fixed A for 100 
fresh denoising trials. Performance degrades with increasing K and decreasing SNR. For large K , 
the greedy algorithms perform worse than IRLS, SOCP, RBRS, and CBCS, whereas the latter four 
methods perform essentially identically across all (SNR, K) combinations. 



not. As K transitions from 3 to 5, however, LSMP performs as badly as MP at low 
SNRs, but its performance picks up as SNR increases. As K continues to increase 
beyond 5, LSMP's performance is unable to surpass that of MP, even when SNR is 
large. Overall, Fig. 5.4 shows that convex programming algorithms are superior to 
greedy methods when estimating sparsity profiles in noisy situations; this coincides 
with data collected in the noiseless experiment in Sec. 5.1, as well as the empirical 
findings of [6, 7]. 

Convergence. For many denoising trials, CBCS typically requires more iterations 
than the other techniques in order to converge. At times, it fails to converge to within 
the specified 5 = 10~ 5 , similarly to how it behaves during the noiseless experiment of 
Sec. 5.1. 

Runtimes. Across all denoising trials, MP, LSMP, IRLS, RBRS, CBCS, SOCP 
have average runtimes of 3.1, 25.1, 57.2, 247.0, 148.5, and 49.2 milliseconds. It seems 
SOCP is best for denoising given that it is the fastest algorithm among the four 
methods that outperforms the greedy ones. IRLS is nearly as fast as SOCP and thus 
is a close second choice for sparsity profile estimation. 

Closer Look: Mean Square Errors of Convex Minimization Methods Before and 
After Estimating the Sparsity Profile and Retuning the Solution. Now let us consider 
the 35th trial of the (SNR = dB, K = 3) pair. We do away with the fixed A 
assumption and now assume we care (to some extent) not only about estimating the 
sparsity profile, but the true solution h to t as well. To proxy for this, we study how 
the mean square errors (MSEs) of solutions generated by IRLS, SOCP, RBRS, and 
CBCS behave across A before and after identifying the sparsity profile and retuning 
the solution. Figure 5.5 depicts the results of this investigation. 
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Running each algorithm for a particular A yields a solution h tot (alg, A) . The 
left subplot simply illustrates the MSEs of the h to t(aig, A)s with respect to the true 
solution. Among SOCP, RBRS, CBCS, and IRLS, only the last is able to determine 
solutions with MSEs less than unity (consider the IRLS error curve for A > 0.3). 

Consider now retuning each of the h to t(alg, A)s as follows: unstack each into 
h„(alg, A) for n G {1, . . . , N} and then remember the K vectors whose £2 energies are 
largest, yielding an estimate of the if-elcment sparsity profile. Let these estimated 
indices be {q\, . . . , qk}- Now, generate a retimed solution by using the K matrices 
associated with the estimated sparsity profile and solving d no ; sy = [C qi , . . . , C 9JC ]x to t 
for x to t S R KP . This latter vector consists of KP elements and by unstacking it 
we obtain a retuned estimate of the h„(alg, A)s, e.g., h 91 (alg, A) equals the first K 
elements of x to t, and so forth, while the other h„(alg, A)s for n ^ {qi, . . . ,qx} are 
now simply all-zeros vectors. Reshuffling the retuned h„(alg, A)s yields g p (alg, A)s 
that are strictly and simultaneously K sparse whose weightings yield the best match 
to the noisy observation in the £2 sense. Unlike the original solution estimate, which 
is not necessarily simultaneously if -sparse, here we have enforced true simultaneous 
if-sparsity. We may or may not have improved the MSE with respect to the true 
solution: for example, if we have grossly miscalculated the sparsity profile, the MSE 
of the retuned solution is likely to increase substantially, but if we have estimated the 
true sparsity profile exactly, then the retuned solution will likely be quite close (in 
the £2 sense) to the true solution, and MSE will thus decrease. 

The MSEs of these retuned solutions with respect to the true h to t are plotted in 
the right subplot of Fig. 5.5. For all algorithms and As, MSE has increased relative to 
the left subplot, which means that in every case our estimate of the true underlying 
solution has worsened. This occurs because across all algorithms and As in Fig. 5.5, 
the true K-teiuv sparsity profile is incorrectly estimated and thus the retuning step 
makes the estimated solution worse. The lesson here is that if one is interested in 
minimizing MSE in low-to-moderate SNR regimes it may be best to simply keep the 
original estimate of the solution rather than detect the sparsity profile and rctune the 
result. If one is not certain that the sparsity profile estimate is accurate, retuning is 
likely to increase MSE by fitting the estimated solution weights to an incorrect set of 
generating matrices. On the other hand, if one is confident that the entire sparsity 
profile will be correctly identified with sufficiently high probability, retuning will be 
beneficial; see [18,20,24] for related ideas. 

5.3. MRI RF Excitation Pulse Design. 

5.3.1. Overview. For the final experiment we study how well the six algorithms 
design MRI RF excitation pulses. In the interest of space and because the conversion 
of the physical problem into an MSSO format involves MRI physics and requires 
significant background, we only briefly outline how the system matrices arise and 
why simultaneously sparse solutions are necessary. A complete formulation of the 
problem for engineers and mathematicians is given in [51]; MRI pulse designers may 
refer to [53]. 

5.3.2. Formulation. For the purposes of this paper, the design of an MRI RF 
excitation pulse reduces to the following problem: assume we are given M points 
in the 2-D spatial domain, ri, . . . ,tm, along with N points in a 2-D "Fourier-like" 
domain, ki, . . . , kAr. Each r m equals [x m , 2/m] T , a point in space, while each k n equals 
[k Xin , k y ^ n ] T , a point in the Fourier-like domain, referred to as "fc-space". The r m s 
and k„s are in units of centimeters (cm) and inverse centimeters (cm" 1 ), respectively. 
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Fig. 5.5. MSEs of Convex Minimization Methods Before and After Estimating the 
Sparsity Profile and Retiming the Solution. Here MSE vs. X is studied during the 35" 1 
trial of the (SNR = dB, K = 3) denoising series. Fixing X and applying a given algorithm 

yields the solution h(alg,X). Left subplot: MSEs of the h(alg,X)s vs. the true solution fatot- Right 
subplot: MSEs of the solution estimates after they undergo retuning to be strictly and simultaneously 
K-sparse. (Sec. 5.2.3 outlines the retuning process.) For all algorithms and As, MSE increases 
substantially relative to the left subplot. None of the methods correctly estimate the true K-term 
sparsity profile and thus the retuning step causes the estimated solution to branch further away (in 
the MSE sense) from the actual one. 



The k n s are Nyquist-spaced relative to the sampling of the r TO s and may be visualized 
as a 2-D grid located at low k x and k y frequencies (where "fc x " denotes the frequency 
domain axis that corresponds to the spatial x axis). Under reasonable assumptions, 
energy placed at one or more points in fc-space produces a pattern in the spatial 
domain; this pattern is related to the fc-space energy via a "Fourier-like" transform 
[39]. Assume we place an arbitrary complex weight g n £ C (i.e., both a magnitude 
and phase) at each of the N locations in fc-space. Let us represent these weights using 
a vector g = [gi, . . . , g N ] T £ C^. In an ideal (i.e., physically-unrealizable) setting, 
the following holds: 

m = Ag, (5.2) 

where A £ C MxW is a known dense Fourier matrix 6 and the mth element of m £ C M 
is the image that arises at r TO , denoted m(r m ), due to the energy deposition along 
the N points on the fc-space grid as described by the weights in the g vector. 

The goal now is to form a desired (possibly complex- valued) spatial-domain image 
d(r) at the given set of spatial domain coordinates (the r m s) by placing energy at 
some of the given k n locations while obeying a special constraint on how the energy 
is deposited. To produce the spatial-domain image, we will use a "P-channel MRI 
parallel excitation system" [29,41] — each of the system's P channels is able to deposit 
energy of varying magnitudes and phases at the fc-space locations and is able to 
influence the resulting spatial-domain pattern m(r) to some extent. Each channel p 
has a known "profile" across space, S p (r) £ C, that describes how the channel is able to 
influence the magnitude and phase of the resulting image at different spatial locations. 
For example, if S l 3(r 5 ) = 0, then the 3rd channel is unable to influence the image that 
arises at location r5, regardless of how much energy it deposits along ki,...,kjv. 
The special constraint mentioned above is as follows: the system's channels may only 

6 Formally, A(m,n) = j^ye 1 *™'^-™ , where j = and 7 is a known lumped gain constant. 
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visit a small number of points in k-space — they may only deposit energy at K <C N 
locations. 

We now finalize the formulation of problem: first, we construct P diagonal ma- 
trices S p E C MxM such that S p (m, m) — S p (r m ),m = 1,...,M. Now we assume 
that each channel deposits arbitrary energies at each of the ./V points in fc-space and 
describe the weighting of the fc-space grid by the pth channel with the vector g p E C N . 
Based on the physics of the P-channel parallel excitation system, the overall image 
m(r) that forms is the superposition of the profile-scaled subimages produced by each 
channel: 

m = SiAg! H h SpAgp 

= Fi gl + --- + F Pgp (5.3) 

= Ftotgtot! 

where m = [m(ri), . . . ,m(rM)] T . Essentially, (5.3) is the real-world version of (5.2) 
for P-channel systems with profiles S p (r) that are not constant across r. 

Recalling that our overall goal is to deposit energy in fc-space to produce the 
image d(r), and given the special constraint that we may only deposit energy among 
a small subset of the TV points in fc-space, we arrive at the following problem: 

min ||d — m|| 2 s.t. the simultaneous if-sparsity of the g s, (5.4) 

Si>--->gp 

where d E C M = [d(rr), . . . , d(r M )] T £ C M and m is given by (5.3). That is, we seek 
out K < N locations in fc-space at which to deposit energy to produce an image m(r) 
that is close in the ii sense to the desired image d(r). Strictly and simultaneously 
if-sparse g p s are the only valid solutions to the problem. 

One sees that (5.4) is precisely the MSSO system given in (3.2) and thus the 
algorithms posed in Sec. 4 are applicable to the pulse design problem. In order to 
apply the convex minimization techniques (IRLS, SOCP, RBRS, and CBCS) to this 
problem, the only additional step needed is to retune any given solution estimate 
g tot (alg, A) into a strictly and simultaneously if -sparse set of vectors; this retuning 
step is computationally tractable and was described in Sec. 5.2.3's "Closer Look' 
subsection. 

Aside. An alternative approach to decide where to place energy at K locations 
in fc-space is to compute the Fourier transform of d(r) and decide to place energy 
at (k x ,ky) frequencies where the Fourier coefficients are largest in magnitude [50]. 
This method does yield valid if-sparse energy placement patterns, but empirically it 
is always outperformed by the convex minimization approaches [51-53] so we do not 
delve into the Fourier-based method in this paper. 

5.3.3. Experimental Setup. Using an eight-channel system (i.e., P = 8) whose 
profile magnitudes (the S p (r)s) are depicted in Fig. 5.6, we will design pulses to 
produce the desired complex-valued image shown in the left subplot of Fig. 5.7. We 
sample the spatial (x, y) domain at M = 356 locations within the region where at 
least one of the profiles in Fig. 5.6 is active — this region of interest is referred to as 
the field of excitation (FOX) in MRI literature. 7 The spatial samples are spaced by 
0.8 cm along each axis and the FOX has a diameter of roughly 20 cm. Given our 

7 Sampling points outside of the FOX where no profile has influence is unnecessary because an 
image can never be formed at these points no matter how much energy any given channel places 
throughout fc-space. 
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Fig. 5.6. Profile Magnitudes of the Eight-Channel Parallel Excitation MRI System. 

Here the magnitudes of the S p (r)s are depicted for p = 1, . . . , 8; 356 samples of each S p (r) are taken 
within the nonzero region of influence and stacked into the diagonal matrix S p used in (5.3). Across 
space, the S v (r)s are not orthogonal — their regions of influence overlap each other to some extent. 
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Fig. 5.7. Desired Image and fe-Space Grid. Left image: desired complex-valued image, d(r). 
Medium- gray region = FOX; other regions denote locations where we want image to be nonzero with 
the given magnitudes and phases. Sampling d(r) at the 356 locations within the FOX allows us to 
construct d in ( 5.3). Right subplot: 15 X 15 grid of N = 225 candidate k-space locations, ki , . . . , k225, 
at which the P channels may deposit energy and thus influence the resulting image. The physical 
constraints of the MRI excitation process force us to place energy at only a small number of grid 
locations. 



choice of r 1; . . . ,r 356 , we sample the S(t)s and d(r) and construct the S p s and d. 
Next, we define a grid of N — 225 points in (k x , fc y )-space that is f 5 x f5 in size and 
extends outward from the /c-space origin. The points are spaced by cm -1 along 
each A-space axis and the overall grid is shown in the right subplot of Fig. 5.7. Finally, 
because we know the 356 r m s and 225 k„s, we construct the 356 x 225 matrix A in 
(5.2, 5.3) along with the F p s in (5.3). We now have all the data we need to apply the 
algorithms and determine simultaneously if-sparse g p s that (approximately) solve 
(5.4). 

We apply the algorithms and evaluate designs where the use of K £ {1, . . . , 30} 
candidate points in fc-space is permitted (in practical MRI scenarios, K up to 30 is 
permissible). Typically, the smallest K possible that produces a version of d(r) to 
within some .^-fidelity is the design that the MRI pulse designer will use on a real 
system. 

To obtain simultaneously i^-sparse solutions with MP and LSMP, we set K = 30, 
run each algorithm once, remember the ordered list of chosen indices, and back out 
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Fig. 5.8. MRI Pulse Design Results. Left subplot: £2 error vs. K is given for MP, LSMP, 
IRLS, RBRS, CBCS, and SOCP. For fixed K , LSMP consistently outperforms the other algorithms. 
Right subplot: objective function values vs. A when SOCP, CBCS, RBRS, and IRLS attempt to 
minimize (3.3, 3.7). SOCP and IRLS converge and seem to minimize the objective; RBRS does 
so as well for most Xs. CBCS routinely fails to converge even after 10,000 iterations and thus its 
solutions yield extremely large objective function values. 



every solution for K = 1 through K = 30 via the retuning technique of Sec. 5.2.3. 

For each convex minimization method (IRLS, SOCP, RBRS, and CBCS), we apply 
the following procedure: first, we run the algorithm for 14 values of A G [0, ^1 , storing 
each resulting solution, g tot (alg, A). Then for fixed K, to determine a simultaneously 
if-sparse deposition of energy on the fc-space grid, we apply the retuning process 
of Sec. 5.2.3 to each of the 14 solutions, obtaining 14 strictly simultaneously K- 
sparse retuned sets of solution vectors, g^(alg, A). The one solution among the 
14 that best minimizes the £2 error between the desired and resulting images, ||d — 
Ftotgt^t? (alg) ^) II2, is chosen as the solution for the K under consideration. Essentially, 
we again assume we know a good value for A when applying each of the convex 
minimization methods. To attempt to avoid convergence problems, RBRS and CBCS 
are permitted 5,000 and 10,000 maximum outer iterations, respectively (see below). 

5.3.4. Results. Image £2 Error vs. Number of Energy Depositions in k-Space. 
Figure 5.8's left subplot shows the £2 error versus K curves for each algorithm. As 
K is increased, each method produces images with lower £2 error, which is sensible: 
depositing energy at more locations in fc-space gives each technique more degrees of 
freedom with which to form the image. In contrast to the sparsity profile estima- 
tion experiments in Sec. 5.1 and Sec. 5.2, however, here LSMP is the best algorithm: 
for each fixed K considered in Fig. 5.8, the LSMP technique yields a simultaneously 
iiT-sparse energy deposition that produces a higher-fidelity image than all other tech- 
niques. For example, when K = 17 LSMP yields a solution that leads to an image 
with £2 error of 3.3. In order to produce an image with equal or better fidelity, IRLS, 
RBRS, and SOCP need to deposit energy at K = 21 points in fc-space, and thus pro- 
duce less useful designs from an MRI perspective. CBCS fares the worst, needing to 
deposit energy at K = 25 grid points in order to compete with the fidelity of LSMP's 
K = 17 image. 

Closer Look: Objective Function vs. A. The right subplot of Fig 5.8 shows how 
well the four convex minimization algorithms minimize the objective function (3.3, 
3.7) before retuning any solutions and enforcing strict simultaneous -ftT-sparsity. For 
each fixed A, SOCP and IRLS find solutions that yield the same objective function 
value. RBRS's solutions generally yield objective function values equal to those of 
SOCP and IRLS, but at times lead to higher values: in these cases RBRS almost 



28 



A. C. ZELINSKI ET AL. 



objFun vs. iterations, X = 0.025 




220 r- 

200 

180 
i 160 
f 140 

120 

100 



rbrs: objFun vs. iterations, A. = 0.025 



50 100 

iteration # 



rbrs: objFun vs. iterations, X = 0.025 



50 100 
iteration # 



220 p 

200 

180 

S 160 
U. 

I 140 
120 
100 



cbcs: objFun vs. iterations, X = 0.025 



2000 4000 6000 8000 10000 
iteration # 



cbcs: objFun vs. iterations, X = 0.025 
15000 r 




Fig. 5.9. Convergence Behavior of IRLS, SOCP, RBRS, CBCS. Each subplot illustrates 
the value of each algorithm's objective function (3.3, 3.7) as the algorithm iterates. Upper row 
subplots are scaled to have the same y axis, whereas the bottom row subplots are "zoomed out" to 
illustrate the overall behavior of RBRS and CBCS. IRLS and SOCP converge rapidly, within 4 and 
19 iterations, respectively. RBRS and CBCS require roughly 150 and 10,000 iterations, respectively. 
The runtimes of IRLS, SOCP, RBRS, and CBCS in this case are 29, 121, 450, and 5,542 seconds. 



converges but does not do so completely. Finally, for most As CBCS's solutions yield 
extremely large objective function values; in these cases CBCS completely fails to 
converge. 

Closer Look: Objective Function Convergence for A = 0.025. The right subplot of 
Fig 5.8 shows that for A = 0.025, IRLS, SOCP, RBRS, and CBCS generate solutions 
that yield the same objective function value, suggesting that each method succeeds at 
minimizing the objective function. Figure 5.9 illustrates how the algorithms converge 
in this specific case: each subplot tracks the value of an algorithm's objective function 
as it iterates. Subplots along the top row all have the same y axis, giving a close look at 
how the algorithms behave. The two subplots along the bottom row "zoom out" along 
the y axis to show RBRS's and CBCS's total behavior. IRLS and SOCP converge 
rapidly, within 4 and 19 iterations, respectively. RBRS requires roughly 150 outer 
iterations, while CBCS requires nearly 10,000. 

Runtimes and Peak Memory Usage. Setting K — 30, we run MP and LSMP 
and record the runtime of each. Across the 14 As, IRLS, RBRS, CBCS, and SOCP's 
runtimes are recorded and averaged. The peak memory usage of each algorithm is also 
noted; these statistics arc presented in Table 5.2. In distinct contrast to the smaller- 
variable-size experiments in Sec. 5.1 and Sec. 5.2 where all four convex minimization 
methods have relatively short runtimes (under one second), here RBRS and CBCS 
are much slower, leaving IRLS and SOCP as the only feasible techniques among the 
four. Furthermore, while LSMP does indeed outperform IRLS and SOCP on an £2 
error basis (as shown in Fig. 5.8), the runtime statistics here show that LSMP requires 
ordcr-of-magnitude greater runtime to solve the problem — therefore, in some real-life 
scenarios where designing pulses in less than 5 minutes is a necessity, IRLS and SOCP 
are superior. Finally, in contrast to Sec. 5.1's runtimes given in Table 5.1, here IRLS 
is 1.9 times faster than SOCP and uses 1.4 times less peak memory, making it the 
superior technique for MRI pulse design since IRLS's £2 error performance and ability 
to minimize the objective function (3.3, 3.7) essentially equal that of SOCP. 
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Algorithm 


Runtime 


Peak Memory Usage (MB) 


MP 


11 sec 


704 


LSMP 


46 min 


304 


IRLS 


50 sec 


320 


RBRS 


87 min 


320 


CBCS 


3.3 hr 


320 


SOCP 


96 sec 


432 



Table 5.2 



Algorithm Runtimes and Peak Memory Usage for MRI Pulse Design. Each algo- 
rithm's runtime and peak memory usage is listed. The runtimes of the latter four algorithms are 
averaged over the fourteen As per trial. MP is again faster than the other techniques, but consumes 
more memory because of its precomputation step (see Algorithm 1). IRLS and SOCP are quite 
similar performance-wise and minimize the convex objective function equally well (see Fig. 5.8), but 
we see here that IRLS is approximately 1.9 times faster and uses 1.4 times less peak memory than 
SOCP, making the former the superior technique among the convex methods. 



Closer Look: Images and Chosen k-Space Locations for K = 17. To conclude 
the experiment, we fix K = 17 and observe the images produced by the algorithms 
along with the points at which each algorithm chooses to deposit energy along the 
grid of candidate points in (k x , fcj,)-space. Figure 5.10 illustrates the images (in both 
magnitude and phase) that arise due to each algorithm's simultaneously 17-sparse set 
of solution vectors, 8 while Fig. 5.11 depicts the placement pattern chosen by each 
method. From Fig. 5.10, we see that each algorithm forms a high-fidelity version of 
the desired image d(r) given in the left subplot of Fig. 5.7, but among the images, 
LSMP's most accurately represents d(r) (e.g., consider the sharp edges of the LSMP 
image's rectangular box). MP's and CBCS's images are noticeably fuzzy relative to 
the others. The placements in Fig. 5.11 give insight into these performance differences. 
Here, LSMP places energy at several higher frequencies along the k y and k x axes, 
which ensures the resulting rectangle is narrow with sharp edges along the spatial y 
and x axes. In contrast, CBCS fails to place energy at moderate-to-high (k x , fcj,)-space 
frequencies and thus cannot produce a rectangle with desirable sharp edges, while MP 
branches out to some extent but fails to utilize high k y frequencies. IRLS, RBRS, 
and SOCP branch out to higher k y frequencies but not to high k x frequencies, and 
thus their associated rectangles in Fig. 5.10 are sharp along the y axis but exhibit 
less distinct transitions (more fuzziness) along the spatial x axis. In general, each 
algorithm has determined 17 locations at which to place energy that yield a fairly 
good image and each has avoided the computationally impossible scenario of searching 
over all iV-choose-X (225-choose-17) possible placements. 

6. Discussion. 

6.1. MRI Pulse Design vs. Denoising and Source Localization. The MRI 

pulse design problem in Sec. 5.3 differs substantially from the source localization prob- 
lem in Sec. 5.1, the denoising experiment in Sec. 5.2, and other routine applications 
of sparse approximation (e.g. [4,6,7,11,16,20,33]). It differs not only in purpose but 
in numerical properties, the latter of which are summarized in Table 6.1. While this 
list will not always hold true on an application-by-application basis, it does highlight 
general differences between the two problem classes. 



8 Each image is generated by taking the corresponding solution gtoti computing m in (5.3), 
unstacking the elements of m into m(r), and then displaying the magnitude and phase of m(r). 
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MRI Pulse Design 


Denoising and Source Localization 


• F p s overdetermined 

• No concept of noise: given d is dtruo 

• Sweep over A useful 

• Metric: ||d - m|| 2 


• F p s underdetermined 

• Noisy: given d is not dtrue 

• Ideal A unknown 

• Metrics: ||g tot - g tot || 2 , and/or 
fraction of rec. sparsity profile terms 



Table 6.1 

Unique Trends of the MRI Pulse Design Problem. This table highlights differences 
between the MRI problem and standard denoising and source localization applications. Items here 
will not always be true, instead providing general highlights about each problem class. 



6.2. Merits of Row-by-Row and Column-by-Column Shrinkage. Even 
though LSMP, IRLS, and SOCP tend to exhibit superior performance across different 
experiments in this manuscript, RBRS and CBCS are worthwhile because unlike the 
former methods that update all PN unknowns concurrently, the shrinkage techniques 
update only a subset of the total variables during each iteration. 

For example, RBRS as given in Algorithm 5 updates only P unknowns at once, 
while CBCS as given in Algorithm 6 updates but a single scalar at a time. RBRS 
and CBCS are thus applicable in scenarios where P and N are exceedingly large and 
tuning all PN parameters during each iteration is not possible. If storing and handling 
M x PN or PN x PN matrices exceeds a system's available memory and causes disk 
thrashing, RBRS and CBCS, though they require far more iterations, might still be 
better options than LSMP, IRLS, and SOCP in terms of runtime. 

6.3. Future Work. 

6.3.1. Efficient Automated Control Parameter Selection. A fast tech- 
nique for finding ideal values of A is an open problem. It might be useful to inves- 
tigate several approaches to automated selection such as the "L-curve" method [25], 
universal parameter selection [14], and min-max parameter selection [27]. 

6.3.2. Runtime, Memory, and Complexity Reduction. As noted in Sec. 4, 
LSMP's computation and runtime could be improved upon by extending the projec- 
tion based recursive updating schemes of [6,7] to MSSO LSMP. In regards to the MRI 
design problem, runtime for any method might be reduced via a multi-resolution ap- 
proach (as in [33]) by first running the algorithm with a coarse fc-space frequency grid, 
noting which energy deposition locations are revealed, and then running the algorithm 
with a grid that is finely sampled around the locations suggested by the coarse result. 
This is faster than providing the algorithm a large, finely-sampled grid and attempt- 
ing to solve the sparse energy deposition task in one step. An initial investigation has 
shown that reducing the maximum frequency extent of the fc-space grid (and thus the 
number of grid points, N) may also reduce runtime without significantly impacting 
image fidelity [53]. 

6.3.3. Shrinkage Algorithm Convergence Improvements. When solving 
the MRI pulse design problem, both RBRS and CBCS required excessive iterations 
and hence exhibited lengthy runtimes. The latter was especially problematic as illus- 
trated in Fig. 5.9. To mitigate these problems, one may consider extending parallel 
coordinate descent (PCD) shrinkage techniques used for single-system single-output 
sparse approximation (as in [15,16]). Sequential subspacc optimization (SESOP) [17] 
might also be employed to reduce runtime. Combining PCD with SESOP and adding 
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a line search after each iteration would yield sophisticated versions of RBRS and 
CBCS. 

6.3.4. Multiple-System Multiple-Output (MSMO) Simultaneous Sparse 
Approximation. In the future it may be useful to consider a combined problem 
where there are multiple observations as well as multiple system matrices. That is, 
assume we have a series of J observations, di, . . . , dj, each of which arises due to a 
set of P simultaneously if-sparse unknown vectors g x •, . . . , gpj 9 passing through a 
set of P system matrices Fij, . . . , Fpj and then undergoing linear combination, as 
follows: 

p 

d = F U«ij + ' ' ' + F P,jSp,j = F P'i%P,3 f or j = 1, • • • , J- (6.1) 

P =i 

If F p j is constant for all J observations then the problem reduces to 

dj = F 1 g 1 j + • • • + F P g P j = F tot g tot)j for j = 1, . . . , J, (6.2) 

and we may stack the matrices and terms as follows: 

[di, . . . , d, 7 ] = F tot [g tot;1 , . . . , g tot)J ] . (6.3) 

Having posed (6.1, 6.2, 6.3), one may formulate optimization problems similar to (2.5, 
3.3) to determine simultaneously sparse g p jS tna t solve (6.3). Algorithms to solve 
such problems may arise by combining the concepts of SSMO algorithms [7,33,47,49] 
with those of the MSSO algorithms posed earlier. 

7. Conclusion. We defined the linear inverse multiple-system, single-output 
(MSSO) simultaneous sparsity problem where simultaneously sparse sets of unknown 
vectors are required as the solution. This problem differed from prior problems involv- 
ing multiple unknown vectors because in this case, each unknown vector was passed 
through a different system matrix and the outputs of the various matrices underwent 
linear combination, yielding only one observation vector. 

To solve the proposed MSSO problem, we formulated three greedy techniques, 
matching pursuit, orthogonal matching pursuit, and least squares matching pursuit, 
along with algorithms based on iteratively reweighted least squares, iterative shrink- 
age, and second-order cone programming methodologies. The MSSO algorithms were 
evaluated across noiseless and noisy sparsity profile estimation experiments as well as 
a magnetic resonance imaging pulse design experiment; for sparsity profile recovery, 
algorithms that minimized the relaxed convex objective function outperformed the 
greedy methods, whereas in the noiseless magnetic resonance imagine pulse design 
experiment, greedy LSMP exhibited superior performance. 

Finally, when deriving CBCS for complex-valued data, we proved that seeking a 
single sparse complex- valued vector is equivalent to seeking two simultaneously sparse 
real-valued vectors — we mapped single- vector sparse approximation of a complex vec- 
tor to the MSSO problem, increasing the applicability of algorithms that solve the 
latter. 

Overall, while improvements upon these seven algorithms (and new algorithms 
altogether) surely do exist, this manuscript has laid the groundwork of the MSSO 
problem and conducted an initial exploration of algorithms with which to solve it. 



9 The K-term simultaneous sparsity profile of each set of g p jS may or may not change with j. 
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Fig. 5.10. MRI Pulse Design: Images per Algorithm for K = 17. Each algorithm is used 
to solve the MRI pulse design problem using 17 energy depositions along the k-space grid, attempting 
to produce an image close to the desired one, d(r), given in the left subplot of Fig. 5.7. From each set 
of simultaneously 17-sparse solution vectors, we calculate the resulting image via (5.3) and display 
both its magnitude and phase. LSMP's image best resembles the desired one; IRLS's, RBRS's, and 
SO CP's images are nearly as accurate; MP's and CBCS's images lack crisp edges, coinciding with 
their larger £2 errors. 
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Fig. 5.11. MRI Pulse Design: Energy Deposition Patterns per Algorithm for K = 17. 

Each algorithm's placement of energy in k-space is displayed. LSMP branches out to moderate k x 
freguencies and high k y freguencies, partly explaining the superiority of its image in Fig. 5.10. 
IRLS, RBRS, and SOCP succeed in branching out to higher k y frequencies but do not place energy 
at \k x \ "S> 0. MP and CBCS fail to spread their energy to high spatial frequencies, and thus their 
images in Fig. 5.10 lack distinct edges and appear as "low-pass filtered" versions of d(r). 



